Skip to content

Error Handling: refactor ExecuteComputation and ExecuteReplicated to propagate status. #9445

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 29, 2025

Conversation

ysiraichi
Copy link
Collaborator

@ysiraichi ysiraichi commented Jul 3, 2025

This PR refactors both ComputationClient:::ExecuteComputation and ComputationClient:::ExecuteReplicated functions for propagating error statuses.

Key Changes:

  • Refactors ExecuteComputation and ExecuteReplicated: Now explicitly propagates absl::Status.
  • Replaces Raw .value() Calls: Uses XLA_ASSIGN_OR_RETURN for propagating the status.
  • Updates Call Sites: Leverages GetValueOrThrow for more robust error management.

@ysiraichi

This comment was marked as outdated.

@ysiraichi ysiraichi force-pushed the ysiraichi/status-for-oom-eager-mode branch 2 times, most recently from bb72d7f to 8f1ba5e Compare July 4, 2025 16:24
@ysiraichi ysiraichi force-pushed the ysiraichi/status-for-oom-errors branch 3 times, most recently from 34abde6 to 103cd0f Compare July 24, 2025 12:40
@ysiraichi ysiraichi force-pushed the ysiraichi/status-for-oom-eager-mode branch from 8f1ba5e to d3e373e Compare July 25, 2025 14:53
@ysiraichi ysiraichi changed the base branch from ysiraichi/status-for-oom-errors to master July 25, 2025 15:11
@ysiraichi ysiraichi marked this pull request as ready for review July 28, 2025 12:46
@ysiraichi ysiraichi force-pushed the ysiraichi/status-for-oom-eager-mode branch from 50f33b9 to 4c3616a Compare July 28, 2025 12:48
Copy link
Collaborator

@zhanyong-wan zhanyong-wan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Key changes:
- Updated base `ComputationClient` interface to return `absl::StatusOr<std::vector<DataPtr>>`
- Modified IFRT and PjRt implementations to use proper error propagation
- Replaced raw `.value()` calls with `XLA_ASSIGN_OR_RETURN_WITH_LOCATION` macros
- Updated all call sites to use `GetValueOrThrow` for exception-based error handling
@ysiraichi ysiraichi force-pushed the ysiraichi/status-for-oom-eager-mode branch from 04bb0d4 to f3f4d78 Compare July 29, 2025 15:07
@ysiraichi ysiraichi merged commit b0ffc49 into master Jul 29, 2025
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants